Adapting SProUT to processing Baltic and Slavonic languages

Authors

  • Witold Drożdżyński
  • Petr Homola
  • Jakub Piskorski
  • Vytautas Zinkevičius
Abstract

This paper presents an initial effort to port SProUT, a novel general-purpose information extraction (IE) platform, to the processing of Baltic and Slavonic languages. We describe the system, characterize these language groups, and discuss the development of named-entity and chunk grammars for these languages, which are crucial for solving information extraction tasks.

Similar resources

Rule-based Named-Entity Recognition for Polish

Although considerable work on named-entity recognition for English and a few other major languages exists, research on this topic with regard to Slavonic languages has been almost neglected. In this paper, we present an attempt towards constructing a named-entity recognition system for Polish on top of SProUT, a novel multi-lingual NLP platform; we discuss the difficulties encountered, and present...

Semi-automatic Approach to Building Dictionary between Slavonic Languages

Machine translation between Slavonic languages is still in its early stages. The existence of bilingual dictionaries has a big impact on translation quality. Unfortunately, creating such language resources is quite expensive. For small languages like Czech, Slovak or Slovenian, it is almost certain that a large-enough dictionary will not be commercially successful. Slavonic languages tend to range between...

Gender in Slavonic from the Standpoint of a General Typology of Gender Systems

This paper outlines a general typology of gender systems and locates the Slavonic systems within it. There are two reasons for adopting this approach: first, it gives a new perspective on the Slavonic data; and second, it highlights those features of gender in Slavonic which are of most interest to researchers working in general linguistics. Slavonic is indeed a rich source: its gender systems ...

Development of multi-voice and multi-language TTS synthesizer (languages: Belarussian, Polish, Russian)

The paper describes some results of research aimed at filling the gap in introducing and promoting computerized speech technology for Slavonic languages, in particular TTS synthesis technology for Belarusian, Polish and Russian. A typological analysis of the peculiarities of the phonemic and allophonic systems of Belarusian, Polish and Russian is given. Based on the resu...

Towards Partial Word Sense Disambiguation Tools for Czech

Complex applications in natural language processing such as syntactic analysis, semantic annotation, machine translation and especially word sense disambiguation consist of several relatively simple independent tasks. Czech, a Slavonic language with many inflectional features, requires more effort for such tasks than other languages. In this article we present two ...

Publication date: 2003